ABSTRACT

Classical approaches in recommender systems, such as
collaborative filtering, concentrate mainly on extracting
static user preferences. This works well, for example, for
music recommendations, where user behavior tends to be stable
over a long period of time. The most common situation in
e-commerce is different, however, and requires reactive
algorithms based on an analysis of short-term user activity.
This paper introduces a small mathematical framework for
detecting short-term user interest, formulated in terms of
item properties, and its application to enhancing recommender
systems. The framework is based on a fundamental concept of
information theory, the Kullback-Leibler divergence.

Categories and Subject Descriptors 

H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms 

Algorithms, experimentation, human factors 
1. INTRODUCTION

From the Artificial Intelligence point of view, a recommender
system is an agent with the user as its environment. Since the
user is an agent too, it is natural to assume that a user who
interacts with a recommender system is usually pursuing some
personal goals. The most general objective of a recommender
system is to respond appropriately to user behavior and thus
to these goals. However, the goals depend only partially on
the user's global preferences.

When user behavior is determined mostly by global preferences
(as in music), the objective degenerates correspondingly. In
e-commerce, user goals are usually dictated by external
reasons unknown to the recommender system. Another related
difference is that the amount of data needed to obtain an
adequate estimate of user preferences is usually far greater
than in other, more specialized areas (for example, movie
recommendations). From the recommender system's perspective,
these two factors make user behavior in e-commerce appear to
depend more on short-term personal goals than on static
preferences, which justifies the value of short-term analysis.

2. DEFINITIONS AND ASSUMPTIONS 

All recommender systems receive an event flow from each user,
but we consider the problem of splitting the event flow into
sessions to be solved.

Definition 1. A user session s is a finite sequence of items
that the user interacted with (usually viewed) while pursuing
one particular goal:

$$ s = \{ i_j \mid i_j \in I \}_{j=1}^{m} \tag{1} $$

where I is the set of all items and m is the length of the
session.

We also introduce a set of properties K, each of which is
defined for every item:

$$ \forall k \in K,\ \forall i \in I : f(i, k) \in V_k \tag{2} $$

where $V_k$ is the set of possible values of property k. For
simplicity, $V_k$ is always a finite set.

2.1 Model of user behavior 

We consider the user as an agent trying to fulfill its own
purpose, and our main assumption is that user actions are
dictated by the wish to find an item with a particular set of
properties $U \subset K$ (for example, color, size or price).
Taking into account an additional assumption about the
rationality of users, we can regard a session as a trace of
some kind of optimization and comparison process performed in
a partially observed environment (items and their
descriptions), which points to the stochastic nature of the
search process. This interpretation of user behavior admits
many mathematical models that may fit well into the suggested
method, but for the purpose of this paper we adhere to one of
the simplest: the user session s is viewed as a set of samples
of a random variable $\psi_s$ with distribution $\Psi_s$:

$$ i_j \sim \Psi_s, \quad j = 1, \dots, m \tag{3} $$

Here $\psi_s$ defines the real user interest within the model,
subject to the observation limits of recommender systems.

It should be noted that (3) is also a definition of a user
session; in practice, however, the event flow can be split
well enough by setting a maximal time difference between
adjacent events and applying a few additional heuristics (for
instance, a purchase event finalizes the current session).
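The two splitting heuristics mentioned above (a maximal time gap and a purchase that closes the session) can be sketched as follows. This is only an illustration: the `Event` record, its field names, the `"purchase"` label and the 30-minute default gap are assumptions, not details given in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary origin (assumed unit)
    item: str         # item identifier
    kind: str         # event type, e.g. "view" or "purchase" (assumed labels)


def split_sessions(events: List[Event], max_gap: float = 1800.0) -> List[List[Event]]:
    """Split one user's time-ordered events into sessions."""
    sessions: List[List[Event]] = []
    current: List[Event] = []
    for ev in events:
        # Heuristic 1: a pause longer than max_gap starts a new session.
        if current and ev.timestamp - current[-1].timestamp > max_gap:
            sessions.append(current)
            current = []
        current.append(ev)
        # Heuristic 2: a purchase event finalizes the current session.
        if ev.kind == "purchase":
            sessions.append(current)
            current = []
    if current:
        sessions.append(current)
    return sessions
```

The function expects the events of a single user, already sorted by time; both thresholds would need tuning on real traffic.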

3. USER INTEREST 

User interest in some property $k \in K$ is determined
relative to the common interest in k. Suppose G denotes the
general distribution of items, that is, the prior probability
of item $i \in I$ appearing in an event, and $G_k$ denotes the
distribution of values of property k. The distributions
$\Psi_s^k$ are defined in a similar way.

Definition 2. User interest within session s is the set of
properties $U_s$:

$$ U_s = \{ k \mid \Psi_s^k \neq G_k,\ k \in K \} \tag{4} $$


Of course, in practice condition (4) is hard to check
directly, since the distribution $\Psi_s^k$ is known only
approximately (see footnote 1). A measure of the difference
between two distributions makes it possible to apply
statistical hypothesis testing, and the Kullback-Leibler
divergence is a natural choice [3] for the test statistic.


Definition 3. Let $P(\omega)$ and $Q(\omega)$ denote
distributions over a finite space $\Omega$. Then the
Kullback-Leibler relative information gain of Q from P is:

$$ D_{KL}(P \mid Q) = \sum_{\omega \in \Omega} P(\omega) \log \frac{P(\omega)}{Q(\omega)} \tag{5} $$


Obviously, in our case:

$$ \Psi_s^k = G_k \iff D_{KL}(\Psi_s^k \mid G_k) = 0 \tag{6} $$
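As an illustration of Definition 3 applied to a single property, the sketch below estimates the session distribution by plain value frequencies and computes its divergence from a global distribution. The function names and the dictionary representation of distributions are assumptions made for the example, and the natural logarithm is used because the paper does not fix the base.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def empirical_distribution(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequencies of the property values seen in a non-empty session."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D_KL(P | Q) = sum over w of P(w) * log(P(w) / Q(w)), natural logarithm."""
    total = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pw * math.log(pw / qw)
    return total
```

The divergence is zero exactly when the two distributions coincide, which is the equivalence stated in (6).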

Definition 2 can be reformulated correspondingly [2]. Now we
can formulate two statistical hypotheses for each $k \in K$,
with $H_0$ corresponding to $k \notin U_s$ and $H_1$ to
$k \in U_s$:

$$ H_0 : D_{KL}(\Psi_s^k \mid G_k) = 0 \tag{7} $$

$$ H_1 : D_{KL}(\Psi_s^k \mid G_k) > 0 \tag{8} $$


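and if $\hat{\Psi}_s^k$ denotes an estimate of $\Psi_s^k$, the
decision rule is the following:

$$ d_k(s) = \begin{cases} k \notin U_s & \text{if } \Delta_s^k < \varepsilon_k \\ k \in U_s & \text{otherwise} \end{cases} \tag{9} $$

where $\Delta_s^k = D_{KL}(\hat{\Psi}_s^k \mid G_k)$.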


Since the distributions $G_k$ are known in advance, the
distribution of $\Delta_s^k$ under $H_0$ can also be
precalculated (see footnote 3). The authors recommend doing
this simply by sampling from $G_k$, since additional
assumptions and modifications may require estimates of
$\Psi_s^k$ that differ from the empirical distribution
function, which may bring unnecessary complications.

1 The distribution estimation error is usually quite large,
since a typical user session contains approximately 5-10
events.

2 As an alternative, consider, for example, the
Kolmogorov-Smirnov test [5].

3 An important point here is that the thresholds
$\varepsilon_k$ depend considerably on the length m of the
session.

It should be noted that one of the canonical ways to obtain
the levels $\varepsilon_k$ is to minimize a risk function,
which may be quite complicated: the end algorithm produces a
sequence of actions, so the risk function may involve a
user-system interaction component. Since the risk function can
be directly inferred from the quality function selected for
the end algorithm, it is much simpler to consider
$\varepsilon_k$ as meta-parameters.
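The precalculation of the null distribution by sampling from $G_k$ can be sketched as a small Monte Carlo routine. The paper treats the thresholds as meta-parameters, so picking the threshold as an upper quantile of the simulated statistic is only one possible choice; the significance level, the number of simulations and all function names below are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, Sequence


def kl_to_global(sample: Sequence[Hashable], g: Dict[Hashable, float]) -> float:
    """KL divergence of the empirical distribution of `sample` from g.

    Assumes every sampled value has positive probability under g.
    """
    m = len(sample)
    counts = Counter(sample)
    return sum((c / m) * math.log((c / m) / g[v]) for v, c in counts.items())


def null_threshold(g: Dict[Hashable, float], m: int, level: float = 0.05,
                   n_sim: int = 2000, seed: int = 0) -> float:
    """Approximate the (1 - level) quantile of the statistic under H0.

    Sessions of length m are simulated by drawing values from g, so the
    result depends on m, as the footnote on thresholds points out.
    """
    rng = random.Random(seed)
    values = list(g.keys())
    weights = [g[v] for v in values]
    stats = sorted(
        kl_to_global(rng.choices(values, weights=weights, k=m), g)
        for _ in range(n_sim)
    )
    idx = math.ceil((1.0 - level) * n_sim) - 1
    return stats[max(0, min(n_sim - 1, idx))]


def is_interesting(session_values: Sequence[Hashable], g: Dict[Hashable, float],
                   threshold: float) -> bool:
    """Decision rule: the property is in the interest set unless the statistic is below the threshold."""
    return kl_to_global(session_values, g) >= threshold
```

In use, a table of thresholds would be computed once per property and per session length, then looked up when a session arrives.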


4. ALGORITHM ENHANCING 

The primary aim of short-term interest detection is to
enhance recommender systems. We consider a base recommender
algorithm $R : I \to I^N$ defined by a weight function
$w_i(\cdot)$:

$$ R(i) = \operatorname*{arg\,topN}_{j \in I} w_i(j) \tag{10} $$

where arg topN is defined analogously to the arg max operator.

Usually, enhancing by short-term user interest is reasonable
when $R(\cdot)$ is an offline algorithm that does not depend
on the whole session s, so that the system responds only to
the current event $s_m$ (see footnote 4); however, it may
depend on long-term user history:

$$ R_u(s) = R_u(s_m) $$

where u denotes the user to whom session s belongs.


We demonstrate only a simple example of enhancing:

$$ c_s(j) = \prod_{k \in U_s} \frac{\Psi_s^k(j)}{G_k(j)} \tag{11} $$

$$ R^*(s) = \operatorname*{arg\,topN}_{j \in I} c_s(j)\, w_i(j) \tag{12} $$

where i is the last item in the session and $c_s(j)$ is the
interest coefficient of item j.


In the very simple case when $w_i(j) = G(j)$, the product
$c_s(j)\, w_i(j)$ corresponds to an estimate of the posterior
probability of item j given session s under our model of user
behavior (see footnote 5).


The expressions for $c_s(j)$ and $R^*(s)$ should be adapted to
the features of $R(\cdot)$ once the nature of the weights
becomes more specific. Expressions (11) and (12) reflect the
probabilistic nature of $w_i(\cdot)$ when recommendations are
based on prior probabilities, which are then rescaled to
posterior probabilities given session s as the evidence.
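A minimal sketch of the re-ranking in (11) and (12) is given below. It assumes that the session and global distributions are evaluated at the value each candidate item takes for the property in question, and that base weights have already been computed for the last item of the session. The dictionary layout and the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def interest_coefficient(item_props: Dict[str, Hashable],
                         psi: Dict[str, Dict[Hashable, float]],
                         g: Dict[str, Dict[Hashable, float]],
                         interest: Set[str]) -> float:
    """Product over detected properties of session probability / global probability."""
    c = 1.0
    for k in interest:
        v = item_props[k]                      # value of property k for this item
        c *= psi[k].get(v, 0.0) / g[k][v]      # assumes g[k][v] > 0
    return c


def enhanced_top_n(candidates: Iterable[str],
                   base_weight: Dict[str, float],
                   props: Dict[str, Dict[str, Hashable]],
                   psi: Dict[str, Dict[Hashable, float]],
                   g: Dict[str, Dict[Hashable, float]],
                   interest: Set[str],
                   n: int) -> List[str]:
    """Rank candidates by interest coefficient times base weight and keep the top n."""
    scored = [
        (interest_coefficient(props[j], psi, g, interest) * base_weight[j], j)
        for j in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scored[:n]]
```

With an empty interest set every coefficient equals 1, so the ranking falls back to that of the base algorithm.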


5. EXPERIMENT 

For the experiment the following model was used:

$$ \Psi_s^k(v) = \left(1 - e^{-\alpha_k |s|}\right) f_s^k(v) + e^{-\alpha_k |s|}\, G_k(v) \tag{13} $$

4 This restriction could easily be relaxed, for example, for
algorithms that take into account a sequence of events limited
by a predefined length. The general idea is that if we do not
want to use the same information twice, the base algorithm
should not share much of its sources with the enhancing
algorithm. Offline algorithms usually satisfy this requirement,
since it is hard to precalculate recommendations for all
possible sessions.

5 If all properties are considered to be independent.





[5] R. H. Lopes. Kolmogorov-Smirnov test. In International
Encyclopedia of Statistical Science, pages 718-720.
Springer, 2011.


The best available proprietary algorithm, cosine similarity
over static features, was used as the base algorithm.
Enhancing was performed by (11) and (12). To demonstrate the
importance of short-term user interest detection, we also
included two simple algorithms for enhancing.



$$ w_{static}(j) = \cos(f(i), f(j)) \tag{14} $$

$$ w_1(j) = 1 \tag{15} $$

$$ w_{popular}(j) = G(j) \tag{16} $$


Data for the experiment was collected from an e-commerce
website specializing in appliances and gadgets. This category
has very rich descriptions (properties) for each item and is
well suited to the suggested algorithm in general.

A simplified version of the DCG metric and a simple 'hit'
metric were used as quality functions. Each user session s
(m = |s|) was divided into two parts:

• history: $h = [s_1, \dots, s_{m-1}]$

• validation: $t = s_m$

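Let $r_l$ denote the recommendation of rank l for session h.
In these terms the evaluation metrics can be expressed as:

$$ \mathrm{DCG}(N) = \sum_{l=1}^{N} \frac{rel_{r_l}}{\log_2(l + 1)} \tag{17} $$

$$ \mathrm{hit}(N) = \sum_{l=1}^{N} rel_{r_l} \tag{18} $$

where

$$ rel_x = \begin{cases} 1 & \text{if } t = x \\ 0 & \text{otherwise} \end{cases} $$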
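The two quality functions can be computed directly from a ranked list and the held-out item. The sketch below assumes that the recommendations are given in rank order and that the validation item is the last item of the session; the function names are illustrative only.

```python
import math
from typing import Hashable, Sequence


def dcg_at_n(recs: Sequence[Hashable], target: Hashable, n: int) -> float:
    """Simplified DCG: sum of 1 / log2(l + 1) over ranks l <= n where the target appears."""
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recs[:n], start=1)
        if item == target
    )


def hit_at_n(recs: Sequence[Hashable], target: Hashable, n: int) -> int:
    """Number of times the target appears among the first n recommendations."""
    return sum(1 for item in recs[:n] if item == target)
```

When each item appears at most once in the list, the hit metric is 0 or 1 per session, and averaging over sessions gives a hit rate.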

The experiment results are shown in Figure 1.
In model (13), $f_s^k(v)$ is the frequency of value v in
session s, and the $\alpha_k$ are considered as
meta-parameters. The additional smoothing is applied to bring
computational stability and to avoid the low-frequency
problem. It should be noted that the optimal $\alpha_k$ are
considerably greater than zero ($\approx 0.5$) in our
evaluation.
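The smoothed session distribution of model (13) can be sketched as a mixture of the in-session frequencies and the global distribution, with the global part weighted by an exponential in the session length. The function name and the dictionary representation are assumptions, and the global distribution is assumed to cover every possible value of the property.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def smoothed_session_distribution(session_values: Sequence[Hashable],
                                  g: Dict[Hashable, float],
                                  alpha: float) -> Dict[Hashable, float]:
    """Mix session frequencies with the global distribution g.

    The weight of g is exp(-alpha * |s|), so longer sessions rely more
    on the observed frequencies and short ones stay close to g.
    """
    m = len(session_values)
    if m == 0:
        return dict(g)  # nothing observed: fall back to the global distribution
    decay = math.exp(-alpha * m)
    counts = Counter(session_values)
    return {
        v: (1.0 - decay) * counts[v] / m + decay * gv
        for v, gv in g.items()
    }
```

Because every value keeps a share of its global probability, no value seen in the catalogue gets probability zero, which keeps the ratio in (11) and the logarithm in (5) finite.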
Figure 1: Results of the experiment. 'static' denotes the original base algorithm; KLb(·) denotes the enhanced algorithm.







